LIDS REPORT 2871

Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems∗
Authors

Huizhen Yu† and Dimitri Bertsekas‡
Abstract
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous deterministic and stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems. These implementations overcome some of the traditional convergence difficulties of asynchronous modified policy iteration, and provide policy iteration-like alternative Q-learning schemes with convergence as reliable as that of classical Q-learning. We also discuss methods that use basis function approximations of Q-factors, and we give an associated error bound.

Sep 2011; revised Mar 2012

∗ Work supported by Air Force Grant FA9550-10-1-0412 and NSF Grant ECCS-0801549.
† Huizhen Yu is with the Lab. for Information and Decision Systems, M.I.T., Cambridge, Mass., 02139. janey [email protected]
‡ Dimitri Bertsekas is with the Dept. of Electrical Engineering and Computer Science, and the Lab. for Information and Decision Systems, M.I.T., Cambridge, Mass., 02139. [email protected]
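To make the evaluation phase concrete, the following is a minimal synchronous lookup-table sketch in the spirit of the abstract: instead of solving a linear system for a fixed policy, it runs a few value iterations for an optimal stopping problem in which "stopping" at a state j yields the current cost estimate J(j), after which J is refreshed. The SSP instance (p, g), the particular stopping mapping, the iteration counts, and all variable names are our illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# A rough synchronous sketch of a policy-iteration-like Q-learning scheme for
# an SSP, on a tiny made-up instance (all problem data are illustrative).
# State 0 is the cost-free, absorbing termination state.
# The evaluation phase applies a few "optimal stopping" value iterations
#   Q(i,u) <- sum_j p(i,u,j) * ( g(i,u) + min( J(j), min_v Q(j,v) ) )
# with J held fixed, instead of solving a linear system exactly; J is then
# refreshed as J(j) = min_v Q(j,v).

n_states, n_controls = 4, 2
rng = np.random.default_rng(0)

# Random transitions p[i,u,:] and one-stage costs g[i,u] (made-up data).
p = rng.dirichlet(np.ones(n_states), size=(n_states, n_controls))
term = np.zeros(n_states)
term[0] = 1.0
p = 0.7 * p + 0.3 * term        # guarantee >= 0.3 termination probability,
                                # so every stationary policy is proper
p[0] = 0.0
p[0, :, 0] = 1.0                # state 0 is absorbing
g = rng.uniform(1.0, 2.0, size=(n_states, n_controls))
g[0] = 0.0                      # no cost at the termination state

Q = np.zeros((n_states, n_controls))
J = np.zeros(n_states)

for outer in range(100):                                # improvement sweeps
    for inner in range(5):                              # a few evaluation iterations
        stop_or_continue = np.minimum(J, Q.min(axis=1)) # min(J(j), min_v Q(j,v))
        Q = g + p @ stop_or_continue                    # one-stage cost + cost-to-go
        Q[0] = 0.0
    J = Q.min(axis=1)                                   # refresh the stopping values

print(np.round(J, 3))
```

Blending in a fixed termination probability keeps every policy proper, so the underlying mapping is a weighted sup-norm contraction and the iterates converge to the optimal Q-factors even though each evaluation phase is truncated after a few inner iterations.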
Related Work
Q-learning and policy iteration algorithms for stochastic shortest path problems
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1...
On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other ...
Stochastic Shortest Path Games and Q-Learning
We consider a class of two-player zero-sum stochastic games with finite state and compact control spaces, which we call stochastic shortest path (SSP) games. They are total cost stochastic dynamic games that have a cost-free termination state. Based on their close connection to single-player SSP problems, we introduce model conditions that characterize a general subclass of these games that have...
Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms
We discuss synchronous and asynchronous iterations of the form x_{k+1} = x_k + γ_k (h(x_k) + w_k), where h is a suitable map and {w_k} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on...
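A concrete special case of this iteration template is a Robbins-Monro averaging scheme: taking h(x) = μ − x (so that x + h(x) has the unique fixed point μ) and zero-mean noise w_k, the iterates track the mean of a noisy signal. The step sizes, distributions, and the choice of h below are our illustrative assumptions, not details from the paper.

```python
import numpy as np

# Robbins-Monro-style instance of the iteration
#   x_{k+1} = x_k + gamma_k * (h(x_k) + w_k)
# with h(x) = mu - x and zero-mean Gaussian noise w_k.
# With gamma_k = 1/k, the iterate is exactly the running average of
# the noisy samples mu + w_k, and it converges to mu.

rng = np.random.default_rng(1)
mu = 3.0
x = 0.0
for k in range(1, 20001):
    gamma = 1.0 / k                  # diminishing steps: sum = inf, sum of squares < inf
    w = rng.normal(0.0, 1.0)         # zero-mean noise w_k
    x = x + gamma * ((mu - x) + w)   # h(x) = mu - x

print(round(x, 2))
```

The standard step-size conditions (steps summing to infinity with summable squares) are what lets the averaging wash out the noise while still reaching the fixed point.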
A Least Squares Q-Learning Algorithm for Optimal Stopping Problems
We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of...